Abstract

microRNAs (miRNAs) are small endogenous non-coding RNAs that function as the universal specificity factors
in post-transcriptional gene silencing. Discovering miRNAs, identifying their targets and further inferring miRNA
functions have been a critical strategy for understanding normal biological processes of miRNAs and their roles in
the development of disease. In this review, we focus on computational methods of inferring miRNA functions,
including miRNA functional annotation and inferring miRNA regulatory modules, by integrating heterogeneous data
sources. We also briefly introduce the research in miRNA discovery and miRNA-target identification with an
emphasis on the challenges to computational biology.
INTRODUCTION

The genetic material of an organism, or genome [1],
plays a central role in encoding both the cellular
fabric and the regulatory machinery that controls
cell homeostasis and internal functions, such as
DNA replication and response to environmental
signals. While the genome is encoded by DNA, the
complex biological processes derived from the genome
involve a myriad of interacting and co-functioning
RNA molecules and diverse protein structures.
These co-functioning groups of molecules, described
as gene regulatory modules, are essential components
in biological systems. In order to understand the
composition of these modules and their roles in an
organism, gene structures, functions and activities
must be investigated in detail within
individual cells and in various tissues throughout
development. However, since gene structure and
function are relatively constant from one cell to another
or from one species to another, it is the patterns of
gene expression and its regulation or dysregulation
that have the greatest consequence in normal biology
and diseases.

While gene expression can be influenced by
many factors, post-transcriptional gene regulation
involving microRNAs (miRNAs) is particularly
fascinating because of the breadth of their
interactions, facilitated by their synergistic/combinatorial
relationships with target genes. miRNAs are
a growing class of ~22 nt long
non-protein-coding RNAs [2, 3]. They are
expressed from longer transcripts encoded in animals,
plants, viruses and single-celled eukaryotes. miRNAs
are also an attractive topic for system modelling and
computer science because they act as guide
strands for mRNA degradation and translational
inhibition, to a large extent through the logic of
complementary base pairing [4].

Increasing evidence suggests that miRNAs are 
pivotal regulators of development and cellular 
homeostasis through their control of diverse biolo- 
gical processes. miRNAs regulate target mRNAs and 
make fine-scale adjustments to protein outputs. 
Consequently, dysregulation of miRNA functions 
can lead to human diseases. Recent studies have re- 
ported differentially regulated miRNAs in diverse 
cancer types, such as breast cancer [5], lung cancer 
[6], prostate cancer [7], colon cancer [8], ovarian 
cancer [9] and head and neck cancer [10]. 
miRNAs are also implicated in a number of neuro- 
logical disorders including Alzheimer's disease [11], 
multiple sclerosis [12] and schizophrenia [13]. Thus,
identifying miRNAs, their targets and their functional
regulatory networks is critical in understanding
normal biological processes of miRNAs and their
roles in the development of disease [14].

Great efforts have been made to discover miRNAs, 
identify miRNA targets and infer miRNA functions 
with both biological methods and computational 
approaches in recent years. These endeavours have
drastically increased the amount of miRNA and
mRNA data at both expression and sequence
levels. However, it is infeasible to explore all the
complexity and diversity of miRNAs and their targets
empirically with biological methods in a combinatorial
matrix due to the laborious tasks involved.
Fortunately, computational methods shed light on
biological research [15] as they facilitate experimental
validation by producing statistically significant
hypotheses from the large amount of biological
measurements.

In this review, we briefly address bioinformatics
approaches to miRNA discovery and target
identification with an emphasis on the challenges to
computational biology, as these methodologies have been
extensively reviewed elsewhere [16-20]. More
attention will be devoted to computational methods
of miRNA functional annotation and inferring
miRNA regulatory modules (MRMs). This exciting
and challenging new development in integrated
genomics has the potential to provide more robust
and tangible functional annotation of miRNA and
miRNA-associated gene networks.

miRNA DISCOVERY 

miRNAs were first identified through a genetic
approach in Caenorhabditis elegans, in research
investigating heterochronic mutants that affect
developmental timing. One of these genes, lin-4,
did not encode a protein but contained a small
segment of homology to multiple motifs in the
3'-untranslated region (3'-UTR) of another
heterochronic gene, lin-14, which does encode a protein [21].
The lin-4 sequence was poorly conserved and for
some years this appeared to be an isolated case
until the discovery of another miRNA gene, again
in C. elegans, known as let-7. The broad conservation
of let-7 among metazoans created significant excitement
about let-7 and the prospect of miRNAs generally,
which led to a rapid discovery process through both
molecular cloning and bioinformatic approaches
[22]. Both of these discoveries were also enhanced
by developments in our understanding of the
biochemistry of RNA interference and miRNA
biogenesis.

Briefly, we now know that miRNAs are initially
produced in the nucleus as long primary transcripts
(pri-miRNAs) by RNA polymerase II, typically
from their own non-coding gene or from the introns
of protein-coding genes. The pri-miRNAs fold into
hairpins, which bind to two members of the RNase
III family of enzymes, Drosha and Dicer. Drosha
forms the microprocessor complex with DGCR8 in
the nucleus and cleaves the primary transcript to
liberate the ~70 nt miRNA precursor (pre-miRNA)
hairpin. After the pre-miRNA is exported to the cytoplasm by
exportin-5, Dicer further processes it to
produce the mature ~20 bp miRNA/miRNA*
duplex. miRNA discovery approaches, both biological
and bioinformatic, have now yielded many
thousands of miRNAs. This process continues, with
new miRNAs appearing daily in various databases
and compiled officially in miRBase
(http://www.mirbase.org/) [23], which is the primary
online repository for published miRNA sequences
and annotation (stored in the miRBase database) as
well as for novel miRNA genes prior to publication
(stored in the miRBase registry). Each entry in the
database represents a predicted hairpin portion of a
miRNA transcript with information on the location
and sequence of the mature miRNA.
With bioinformatic methods, putative miRNAs
are first predicted in genome sequences based on
the structural features of miRNAs. These algorithms
essentially identify hairpin structures in non-coding
and non-repetitive regions of the genome that are
characteristic of miRNA precursor sequences. The
candidate miRNAs are then filtered by their
evolutionary conservation in different species. Known
miRNA precursors play important roles in searching
algorithms because structures of known miRNAs are
used to train the learning processes to discriminate
between true predictions and false positives [16, 24].
Many algorithms, for example, miRScan [25],
miRSeeker [26], miRank [27], miRDeep [28],
miRDeep2 [29] and miRanalyzer [30], have been
proposed. Once candidates are predicted, experimental techniques
such as molecular cloning, sequencing or hybridization
are typically used to validate the predictions.
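
To make the structural filter concrete, the sketch below scores how well the two arms of a candidate sequence could pair if the sequence were folded in half around a short loop. It is only a crude stand-in for the hairpin detection step: the tools cited above use far richer features, and the function names, the fixed loop length and the 0.7 threshold are illustrative assumptions, not taken from any of them.

```python
# Watson-Crick pairs plus G-U wobble pairs, which are tolerated in RNA stems.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_pairing_fraction(seq, loop_len=4):
    """Fraction of arm positions that can pair when seq folds around a central loop.

    The sequence is split into a 5' arm, a loop of about loop_len bases and a
    3' arm; position i of the 5' arm is compared with position i from the 3' end.
    """
    rna = seq.upper().replace("T", "U")  # accept DNA or RNA input
    arm_len = (len(rna) - loop_len) // 2
    if arm_len <= 0:
        return 0.0
    five_prime_arm = rna[:arm_len]
    three_prime_arm = rna[len(rna) - arm_len:][::-1]  # read 3' arm backwards
    paired = sum((a, b) in PAIRS for a, b in zip(five_prime_arm, three_prime_arm))
    return paired / arm_len


def looks_like_hairpin(seq, min_fraction=0.7, loop_len=4):
    """Toy filter: keep candidates whose arms are mostly complementary."""
    return stem_pairing_fraction(seq, loop_len) >= min_fraction
```

A real pipeline would replace this ungapped comparison with a proper secondary-structure prediction and would then apply the conservation filter described above.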

These experimental approaches have also led the discovery
process, with high-throughput sequencing in particular
producing small-RNA expression profiles, which can be
followed up by bioinformatics to identify RNAs whose
structures meet the miRNA criteria. This approach is
significantly faster than the classic forward genetics used to
identify novel miRNAs. Forward genetics was used to
discover the first known miRNA, lin-4, in C. elegans
in 1993 [21]. The advantage of directional cloning is
that it can be applied to any organism even when
little or no genomic information is available. With
the advance of next-generation sequencing (NGS),
deep sequencing has also been used to discover
miRNAs systematically at a phenomenal rate [28,
31], and predicted miRNAs from deep sequencing
have been incorporated into miRNA databases [23].

These biological approaches to miRNA discovery
have complemented discoveries made through
computational approaches, which predict miRNAs from
genomic DNA sequence. Collectively, a very large
number of miRNAs have been identified and
predicted in a very short time frame [24, 32]. The latest
miRBase, release 19, contains 21 264 hairpin
precursor miRNAs, expressing 25 141 mature miRNA
products, in 193 species [23]. Each upgrade continually
refines the predictions. Compared with release
18, miRBase added 3171 new hairpin
sequences and 3625 novel mature products, while
over 130 misannotated and duplicate sequences
have been deleted. This success in miRNA discovery
has rapidly led to an even more daunting challenge
in functional annotation, or in other words, what are
these molecules doing in cells and what are the
functional implications of their dysregulation in the
pathophysiology of diseases? While these questions
have also been addressed both biologically and
computationally, the sheer magnitude of this task,
particularly from an empirical perspective, has driven
significant development in the bioinformatics of
miRNA-target prediction and systems-based analysis
of miRNA function.



miRNA-TARGET PREDICTION 

In the absence of high-throughput biological
approaches to identify miRNA targets, many
computational methods, such as miRanda [33], mirSVR
[34], PicTar [35], TargetScan [36], TargetScanS [37],
RNA22 [38], PITA [39], RNAhybrid [40] and
DIANA-microT [41], were developed relatively
quickly to identify putative miRNA targets. In
most cases, these algorithms were developed in
conjunction with a limited amount of empirical
evidence from a few experimentally validated target
sites for a small selection of miRNAs [42].

miRNAs target mRNAs through complementary
base pairing, in either complete or incomplete
fashion. It has been generally believed that miRNAs
bind to the 3'-UTRs of the target transcripts in at
least one of two classes of binding patterns [17]. One
class of target sites has perfect Watson-Crick
complementarity to the 5'-end of the miRNA, referred
to as the 'seed region', which lies at positions 2-7 of the miRNA.
The seed region has been shown to be sufficient
for miRNAs to suppress their targets without
requiring significant further base pairing at the 3'-end of
the miRNA. In contrast, the second class of
target sites has imperfect complementary base pairing
at the 5'-end of the miRNA, but this is compensated
by additional base pairing at the 3'-end of the
miRNA. However, the 3'-UTR boundaries are
not clearly defined in many species and it is still an
ongoing project to characterize the location, extent
and splice variation of 3'-UTRs in a variety of species
[18]. In addition, it has been demonstrated that a
transcript can contain multiple target sites for a
single miRNA and a transcript can have target sites
for several miRNAs. These multiple-to-multiple
relations between miRNAs and mRNAs lead to
even more complex miRNA regulatory
mechanisms. Regardless of the binding sites, miRNAs
are so short that their matches rarely reach
significance under the statistical techniques of standard
sequence analysis, such as Karlin-Altschul statistics
[43]. Therefore, most algorithms apply a
cross-species conservation requirement to reduce
the number of false positives, despite some risk of
increasing false negatives, as some miRNAs, such as
miR-430 [44], lack conserved targets. Overall, the
complex features of miRNAs pose great challenges
to the computational approaches for miRNA-target
prediction.
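
As an illustration of the first class of target sites, the following sketch searches a 3'-UTR for perfect matches to the seed at positions 2-7 of a miRNA. It covers only exact seed pairing; 3' compensatory sites, site accessibility and conservation filtering are deliberately left out. The function names are hypothetical, and the example sequences are used purely for illustration.

```python
# Map each RNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna, start=2, end=7):
    """Return the mRNA motif (5'->3') that pairs perfectly with the miRNA seed.

    start and end are 1-based, inclusive positions counted from the miRNA 5' end.
    """
    rna = mirna.upper().replace("T", "U")
    seed = rna[start - 1:end]
    # The target strand is antiparallel, so complement and then reverse.
    return seed.translate(COMPLEMENT)[::-1]


def find_seed_matches(mirna, utr, start=2, end=7):
    """Return 0-based start offsets of every perfect seed match in the UTR."""
    site = seed_site(mirna, start, end)
    target = utr.upper().replace("T", "U")
    hits = []
    pos = target.find(site)
    while pos != -1:
        hits.append(pos)
        pos = target.find(site, pos + 1)  # allow overlapping matches
    return hits


if __name__ == "__main__":
    mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # example miRNA sequence
    utr = "AAACUACCUCAAAUACCUCGG"     # made-up UTR fragment with two sites
    print(seed_site(mirna), find_seed_matches(mirna, utr))
```

Passing `end=8` would search for the longer 7mer match instead; a single transcript returning several offsets corresponds to the multiple-site situation described above.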

Different miRNA-target prediction algorithms
predict targets with different techniques and criteria,
including base pairing, target accessibility and
evolutionary conservation of the target site. Table 1 gives
some basic features of the selected miRNA-target
prediction methods, while comprehensive reviews
of these methods can be found in [33, 45-47].

The binding patterns and criteria used by
miRNA-target prediction algorithms generally have
great influence on the outputs and performance of
different algorithms. A small variation in the criteria
for selection can lead to a large discrepancy in the
predictions [33]. Consequently, the approaches produce
widely different lists of predictions with significant
false-positive and false-negative rates [48]. In a
study using mass spectrometry to measure the
global impact of deletion of a single miRNA,
hundreds of proteins were found to respond, but the best
performing target prediction algorithms, TargetScan
[36] and PicTar [49], which restrict their predictions
primarily to conserved sites in 3'-UTRs, were
nevertheless reported to have false-positive rates of
~68% [50]. Although this false-positive rate might
not be accurate, because the work did not take into
account the indirect influence on proteins of
knocking down a miRNA, it demonstrated biologically
that all target prediction algorithms suffer from high
false-positive rates.

Another assessment was conducted by Alexiou
et al. [51], who compared the performance of eight
widely used target prediction programs, including
EIMMo [52], miRanda [33], miRBase [53], PicTar
[49], PITA [39], RNA22 [54] and TargetScan 5.0
[36], for the human and mouse genomes, using
experimentally validated targets in Selbach et al. [55].
They found that these algorithms have a precision of
~50% with a sensitivity that ranges from 6 to 12%.
These assessments arrived at similar, though not identical,
conclusions. One possible explanation could be
that the selected benchmark sets favour certain
algorithms while underestimating others, particularly
when the number of validated miRNA targets is
still relatively small. However, it highlights the
problem of validating miRNA-target interactions at a
scale that is insufficient to provide an unbiased
assessment of the performance of miRNA-target
prediction algorithms.
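
For reference, precision and sensitivity in such a benchmark can be computed as in the sketch below, where predictions and validated targets are sets of (miRNA, gene) pairs. The function name and example data are hypothetical. Note the assumption built into the calculation: every prediction absent from the validated set counts as a false positive, which is exactly why a small or biased benchmark distorts the estimates.

```python
def precision_and_sensitivity(predicted, validated):
    """Score predicted miRNA-target pairs against a validated benchmark set.

    precision   = true positives / all predictions
    sensitivity = true positives / all validated interactions
    Unvalidated predictions are treated as false positives (an assumption).
    """
    predicted = set(predicted)
    validated = set(validated)
    true_positives = len(predicted & validated)
    precision = true_positives / len(predicted) if predicted else 0.0
    sensitivity = true_positives / len(validated) if validated else 0.0
    return precision, sensitivity


if __name__ == "__main__":
    # Made-up pairs for illustration only.
    predicted = {("miR-x", "geneA"), ("miR-x", "geneB")}
    validated = {("miR-x", "geneA"), ("miR-x", "geneC"),
                 ("miR-x", "geneD"), ("miR-x", "geneE")}
    print(precision_and_sensitivity(predicted, validated))
```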

Despite their limitations, these programmes have 
been broadly adopted and their prediction of 
miRNA targets in a broad spectrum of species has 
been prepared for genome data and stored in data- 
bases for download or query by biologists or medical 
scientists, in some cases with very limited under- 
standing of how they were derived. Some of these 
databases provide additional features that provide 
some further insight into the strength and
conservation of the putative interaction. For example,
TargetScan searches for the presence of conserved
8mer and 7mer sites that match the seed region of
each miRNA and also predicts non-conserved sites.
Since version 6.0, TargetScan has extended context
score contributions to include seed-pairing stability
and target-site abundance, and considers all 3'-UTRs from
RefSeq rather than just the longest UTR from
each gene.

Recently, some novel biochemical approaches 
have been developed for miRNA-target identifica- 
tion, providing an extensive insight into the 
miRNA-binding sites. For example, Chi et al. [56] 
developed a technique, known as high-throughput 
sequencing of RNA isolated by crosslinking immu- 
noprecipitation (HITS-CLIP), to identify direct 
miRNA targets. This technique was applied to 
mouse brain [56] and C. elegans [57]. Consequently, 
compelling data have been generated on the location 
of miRNA-binding sites within both the 3'-UTR 
and coding region, allowing the genome-wide inter- 
action maps for specific miRNA to be depicted with 
a high specificity and low false discovery rate com- 
pared with previous computational methods [56]. A 
modified HITS-CLIP, termed photoactivatable
ribonucleoside-enhanced crosslinking and
immunoprecipitation (PAR-CLIP) [58], is able to deliver
more efficient ultraviolet crosslinking, which in
turn improves RNA recovery up to 1000-fold
compared to its predecessor. It can also achieve more
precise localization of binding sites between the
RNA and protein [58].

By analysing HITS-CLIP/PAR-CLIP data,
alternative modes of miRNA-target recognition have
been identified. Ellwanger et al. [59] demonstrated
that most conserved miRNAs interact with target
sites endowed with short seed matches (6mer seeds)



Identifying miRNAs, targets and functions 



Table 1: Basic features of selected miRNA-target prediction methods [table body not recoverable]



Liu et al. 



and a substantial fraction (40%) of all functional
target sites are not conserved. In contrast, common
miRNA-target prediction algorithms focus mainly
on conserved seeds of length seven or eight.
Furthermore, Chi et al. [60] demonstrated that over
15% of Ago-miRNA interactions with G-bulge sites
in mouse brain cannot be explained by canonical
seed matches, suggesting a novel mode of miRNA-target
recognition. These analyses provide a qualitative
change in our understanding and assessment of
miRNA-mRNA regulation, which in turn may
transform the miRNA prediction algorithms in the
near future.



INFERRING miRNA FUNCTIONS 

As many miRNAs have been identified, and a large 
number of miRNA targets have been predicted, 
research has quickly shifted to inferring miRNA 
functions, which generally include functional anno- 
tation and inferring miRNA regulatory mechanisms 
in specific biological conditions. We will review the 
methods of inferring miRNA functions in this 
section. 

miRNA functional annotation 

The most straightforward approach to miRNA
functional annotation is through functional
enrichment analysis using the miRNA-target genes
(Figure 1). This approach assumes that miRNAs have
similar functions to their target genes, given that a large
amount of knowledge about genes has been
accumulated in the last few decades. Therefore, it is practical
to assign to miRNAs the functions that are significantly
enriched among the targeted genes. When a
list of miRNA targets is available, well-developed
gene functional annotation resources such as
DAVID [61] and WebGestalt [62] can be easily
used to assign the functions of the target mRNAs
to the group of miRNAs. Functional annotation
for miRNAs gives great insights into the general
functions of miRNAs. For the first time, it enabled
the mysteries of miRNA functions to be revealed
on a large scale.
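
The enrichment step can be illustrated with a one-sided hypergeometric test, which asks how likely it is to see at least the observed number of target genes annotated with a given term by chance. This is a common choice for over-representation analysis, but it is an assumption here: the tools named above may use other statistics and corrections for multiple testing. The function and argument names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_annotated, n_targets, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric(n_background, n_annotated, n_targets).

    n_background: genes in the background set
    n_annotated:  background genes carrying the annotation term
    n_targets:    predicted target genes of the miRNA(s), all within the background
    n_overlap:    target genes carrying the annotation term
    """
    upper = min(n_annotated, n_targets)
    total = comb(n_background, n_targets)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_targets - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Made-up numbers: 40 of 200 targets fall in a term covering 500 of 20000 genes.
    print(hypergeom_enrichment_p(20000, 500, 200, 40))
```

A term would be assigned to the miRNA set when this p-value, after correction for the number of terms tested, falls below a chosen significance level.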

Similarly, miRGator [63], miRDB [64], miRo
[65], MAGIA [66] and FAME [67] have been
developed with target prediction and built-in functional
annotation. These are freely accessible databases with
user-friendly web interfaces providing miRNA
functional annotation with a similar strategy (Table 2).
miRGator [63] infers miRNA functions from a list of
target genes predicted by miRanda, PicTar and
TargetScanS. As an option, the list of target genes
can be the union or intersection of predictions from
these three programmes. A statistical enrichment test of
target genes in each term is carried out for Gene
Ontology (GO), pathway and disease annotations.
miRDB [64] uses MirTarget2 [68] for miRNA-target
prediction, which applies a machine learning method
(support vector machines) with public microarray
data sets. miRDB also adopts the Wiki model
for functional annotation, which is an open
environment allowing anyone with internet access to make
contributions. miRo [65] associates miRNAs with
phenotypes by integrating miRNA annotation and
target databases, such as miRBase, miRNA Atlas,
TargetScan, PicTar and miRecords, with multiple
online biological knowledge databases including
Gene and Nucleotide Database (GND,
http://www.ncbi.nlm.nih.gov), GO and Genetic
Association Database (GAD, http://geneticassociationdb.nih.gov,
a database of human genetic association
studies of complex diseases and disorders). Validated
data are highlighted in the databases, indicating the
most significant associations. MAGIA [66] supports
multiple algorithms of miRNA-target prediction,
such as miRanda, PITA and TargetScan, and
multiple statistical methods to infer the relationship
between miRNA-mRNA pairs and biological
processes or diseases. Unlike the other methods,
FAME [67] annotates miRNA functions by
incorporating the expression profiles of miRNAs/mRNAs
with the miRNA-target prediction. It uses
a co-expressed subset of miRNA-target genes,
which are considered to be the designated target set
based on the parameters extracted from TargetScan,
such as context score, for functional enrichment.
A few other tools, such as miR2Disease [69] and
miReg [70], manually curate annotation of
miRNA functions and focus on the association
with human diseases based on the literature. Their
scale is very limited owing to the limited current knowledge
about miRNAs. All these tools are similar in their
strategy for functional annotation, while they differ
in the databases involved and the enrichment methods
used.

miRNA functional annotation relies heavily on
miRNA-target prediction, as most of the approaches
are based on predicted targets. As discussed
above, target prediction varies greatly among
different algorithms, and even with a small change to
the parameters used by the algorithms. Furthermore,
some evidence has shown that miRNAs may target
mRNAs outside of the 3'-UTR. Mature miRNAs
can alter the expression of genes by binding to the
5'-UTR [71, 72]. Other regions, known as extended
seed and delta seed regions, also contribute to
target selection [73]. Most prediction
algorithms will miss those targets because they focus
on the 3'-UTR [17]. Moreover, it is possible that the
target sites for different miRNAs in the same
3'-UTR indicate that the mRNAs are regulated by
tissue-specific or development-specific miRNAs
[17]. It is not reasonable to group them together
for functional annotation. Therefore, more
sophisticated methods are needed to infer
miRNA functions.

Figure 1: A framework of miRNA functional annotation. A set of miRNAs
is passed to miRNA-target prediction software (miRanda, mirSVR, PicTar,
TargetScan, TargetScanS, RNA22, PITA, DIANA-microT, RNAhybrid, ...) or
looked up in miRNA-target databases (MicroCosm Targets, microRNA.org,
TarBase, PicTar, TargetScan, RNA22, PITA, RNAhybrid, ...) to obtain a set
of miRNA targets (predictions from one software/database, or the union or
intersection of predictions from multiple software/databases), which is
then analysed against knowledge databases (GO, KEGG, BioCarta, Reactome,
GND, GAD, Disease Ontology, PPI, ...).

Inferring miRNA regulatory mechanism

In order to gain global and yet specific insights into
the functions of miRNAs in a broad layer of
post-transcriptional control, methods beyond
searching for the base pairing between miRNAs and
mRNAs have been proposed. In the last few years,
many studies have been conducted to infer the
miRNA regulatory mechanisms by incorporating
target prediction with other genomics data, such as
the expression profiles of miRNAs and mRNAs.

Table 2: Tools for miRNA functional annotation

Tool | Target prediction | Use expr. | Knowledge base | Link | Reference
DAVID | N/A | No | GO, KEGG, BioCarta, GAD, OMIM Disease, PPI, etc. | http://david.abcc.ncifcrf.gov/ | [61]
WebGestalt | N/A | No | GO, KEGG, IPI, Pathway Commons, Wikipathways, MGI, SGD, MSigDB, NCBI dbSNP | http://bioinfo.vanderbilt.edu/webgestalt/ | [62]
miRGator | miRanda, PicTar and TargetScanS | No | GO, KEGG/GenMapp/BioCarta, Disease Ontology | http://genome.ewha.ac.kr/miRGator/ | [63]
miRDB | MirTarget2 | Yes | Wiki model | http://mirdb.org/miRDB/ | [64]
miRo | miRanda, PITA and TargetScan | Yes | NCBI Gene Database, NCBI Nucleotide Database, GO, GAD | http://ferrolab.dmi.unict.it/miro/ | [65]
MAGIA | miRanda, PITA and TargetScan | Yes | Through DAVID APIs | http://gencomp.bio.unipd.it/magia/start/ | [66]
FAME | TargetScan | Yes | Experimentally verified miRNA-pathway and miRNA-process associations | http://acgt.cs.tau.ac.il/fame/ | [67]
miR2Disease | N/A | No | Manually curated database containing 1939 relationships between 299 human miRNAs and 94 human diseases | http://www.mir2disease.org/ | [69]
miReg | N/A | No | Manually curated database containing 47 human miRNAs, 85 proteins, 115 upstream regulators, 165 targets, 38 diseases, 295 reactions and 70 biological processes | http://www.iioab-mireg.webs.com/ | [70]

N/A, not applicable.

Largely, the inference of miRNA regulatory
mechanism can be regarded as a question of data
integration in the functional analysis of miRNAs.
A few types of data are available for inferring miRNA
regulatory mechanism, namely (i) miRNA-target
information from target prediction algorithms or
miRNA-target databases, (ii) sample-matched
expression profiles of miRNAs and mRNAs from
microarray experiments or NGS techniques and
(iii) biological conditions or diseases associated with
different samples.

miRNA-target information from target prediction
algorithms or miRNA-target databases provides a
way to build the potential relationships between
miRNAs and mRNAs. Several miRNA-target
databases, such as TargetScan [36], PicTar [35], TarBase
[74], miRecords [75] and miRWalk [76], store
computationally predicted miRNA targets as well as a few
biologically validated ones. The miRNA-target
information is usually presented as a table where each
row indicates a target pair of miRNA and mRNA.
Besides the miRNA and its target mRNA, other
information, such as the sequence of the miRNA/mRNA,
the binding score between miRNA and mRNA and the
number of conserved and non-conserved sites, may
also be presented in each row depending on the
miRNA-target prediction algorithms used.

The expression profiles of miRNAs/mRNAs
are usually organized as 2D tables where the
columns are samples from different biological
conditions and the rows are miRNAs/mRNAs. Each cell
of the table is an expression value of a certain
miRNA/mRNA in a sample, either from microarrays
or estimated from NGS techniques. Microarray
technology is a powerful method for routine studies of
selected target sequences, while NGS data enable a
more detailed inspection of gene diversity because NGS
allows wider applications and provides better
sensitivity, accuracy and dynamic range than
microarrays. It is worth noting that in general there is good
concordance between the platforms [77, 78],
particularly in terms of the biological interpretation [79]. In
general, the expression profiles are subjected to a
series of pre-processing steps, such as background
subtraction and normalization, before they can be
used for a variety of downstream analyses.






Each sample is usually associated with a biological
condition, such as cancer or normal, which provides
the class information when inferring the function of
miRNAs. Microarray experiments and NGS are
commonly designed in a comparative fashion in
which the samples are extracted from different
biological conditions in order to find biological
differences among conditions. This is critical information for
guiding the inference of miRNA functions, and it
is usually recorded as a class label tagged to each sample.

Depending on what information is involved, we
classify the computational methods of inferring
miRNA regulatory mechanisms into two categories:
(i) predicting MRMs, that is, identifying a group of
co-expressed miRNAs and mRNAs, either at the
sequence level [80] or by integrating sequence and
expression profiles of miRNAs and mRNAs [81-84]
and (ii) inferring functional miRNA-mRNA
regulatory modules (FMRMs), which are regulatory
networks of miRNAs and their target mRNAs in
specific biological processes [85-88]. MRMs suggest
a broader control of mRNAs by miRNAs in terms of
general functions, while FMRMs focus on more
detailed miRNA regulatory mechanisms in specific
biological conditions. Figure 2 illustrates the general
framework of inferring MRMs/FMRMs.

Predicting MRMs 

The first attempt at predicting MRMs was made
by Yoon and De Micheli [80], who modelled
the miRNA co-target relationship with graph
theory. In this approach, an MRM is defined as a
special bipartite graph, named a biclique, where two
sets of nodes are connected by edges. Every node of
the first set, representing miRNAs, is connected to
every node of the second set, representing mRNAs,
by edges with similar weights. The weights of edges
correspond to the miRNA-mRNA binding strength
inferred from target prediction algorithms, such as
the methods described in Lewis et al. [36] and John
et al. [33], where the strength of miRNA-target
binding can be quantified. The biological observation
from Lai [89], that the strength of each binding is
neither too strong nor too weak but modest and similar
when multiple binding sites exist on a target, is
formulated in the method. Potential miRNA-target
relationships are first constructed as weighted bipartite
graphs based on the sequence binding between
miRNAs and mRNAs. Then, a graph-mining
method is proposed to discover bicliques in which
all the edges have similar weights in the given
bipartite graphs. Statistically significant MRMs are
selected by calculating the probability of finding a
module by chance. This is the first method that
explicitly searches for the multiple-to-multiple
relationships among miRNAs and their target genes.
The limitation is that it models the miRNA
co-target relationship at the sequence level only,
thus the miRNA regulatory patterns at the
expression level are not characterized.
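
The biclique definition can be made concrete with a small brute-force sketch. It is not the graph-mining procedure of Yoon and De Micheli [80], and it omits their statistical significance step; it simply enumerates miRNA and mRNA subsets and keeps those in which every miRNA-mRNA edge exists and all edge weights lie within a tolerance of each other. The function names, the `tol` parameter and the example weights are hypothetical, and the exhaustive search is only feasible for very small graphs.

```python
from itertools import combinations


def is_similar_weight_biclique(weights, mirnas, mrnas, tol):
    """True if every (miRNA, mRNA) edge exists and the weights span at most tol."""
    edge_weights = []
    for mirna in mirnas:
        for mrna in mrnas:
            if (mirna, mrna) not in weights:
                return False  # a missing edge breaks the biclique
            edge_weights.append(weights[(mirna, mrna)])
    if not edge_weights:
        return False
    return max(edge_weights) - min(edge_weights) <= tol


def enumerate_modules(weights, tol, min_mirnas=2, min_mrnas=2):
    """Exhaustively list candidate modules in a weighted bipartite graph.

    weights maps (miRNA, mRNA) pairs to a binding-strength score.
    """
    mirnas = sorted({m for m, _ in weights})
    mrnas = sorted({g for _, g in weights})
    modules = []
    for i in range(min_mirnas, len(mirnas) + 1):
        for mirna_set in combinations(mirnas, i):
            for j in range(min_mrnas, len(mrnas) + 1):
                for mrna_set in combinations(mrnas, j):
                    if is_similar_weight_biclique(weights, mirna_set, mrna_set, tol):
                        modules.append((mirna_set, mrna_set))
    return modules


if __name__ == "__main__":
    # Made-up binding scores for illustration only.
    weights = {
        ("miR-a", "g1"): 0.50, ("miR-a", "g2"): 0.55,
        ("miR-b", "g1"): 0.52, ("miR-b", "g2"): 0.48,
        ("miR-b", "g3"): 0.90,
    }
    print(enumerate_modules(weights, tol=0.1))
```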

Recent methods have integrated the analysis of
expression profiles of miRNAs and mRNAs in
conjunction with the predicted miRNA targets.
Most of the integrative methods of MRM discovery
are based on the assumption that miRNAs negatively
regulate their target mRNAs, so that an
inverse relationship should exist between the
expression of a specific miRNA and its targets.

Figure 2: A framework of inferring MRMs/FMRMs.
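
The inverse-expression assumption can be sketched as a simple filter: among predicted miRNA-mRNA pairs, keep those whose sample-matched expression profiles are negatively correlated. Pearson's correlation and the -0.5 cut-off are illustrative assumptions rather than the procedure of any particular method reviewed here, and the names and data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (sd_x * sd_y)


def anticorrelated_pairs(predicted_pairs, mirna_expr, mrna_expr, threshold=-0.5):
    """Keep predicted (miRNA, mRNA) pairs whose expression is inversely related.

    mirna_expr and mrna_expr map names to expression values over the same
    samples, in the same order.
    """
    kept = []
    for mirna, mrna in predicted_pairs:
        r = pearson(mirna_expr[mirna], mrna_expr[mrna])
        if r <= threshold:
            kept.append((mirna, mrna, r))
    return kept


if __name__ == "__main__":
    # Made-up profiles over four matched samples.
    mirna_expr = {"miR-a": [1.0, 2.0, 3.0, 4.0]}
    mrna_expr = {"g1": [8.0, 6.0, 4.0, 2.0], "g2": [1.0, 2.0, 3.0, 4.0]}
    pairs = [("miR-a", "g1"), ("miR-a", "g2")]
    print(anticorrelated_pairs(pairs, mirna_expr, mrna_expr))
```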

Huang et al. [81, 90] applied Bayesian network
(BN) parameter learning to infer miRNA-mRNA
interactions using both the miRNA-mRNA
sequence binding information and the
sample-matched expression profiles of miRNAs and
mRNAs. An initial network representing the putative
target relationships between miRNAs and mRNAs is
constructed according to the target information
predicted from sequence binding. Then the observation
that miRNAs down-regulate their target mRNAs is
encoded in the network at the expression level. The
model treats the expression of an mRNA, which is
assumed to follow a Gaussian distribution, as the
negative of a weighted sum of the expression values of
its regulator miRNAs. Gaussian BN parameter learning
is used to infer the likelihood of miRNAs regulating
target mRNAs at the expression level. This method
explicitly encodes the inverse expression patterns
between miRNAs and their target mRNAs in the
interaction network. Furthermore, this model searches
for co-expressed miRNAs and mRNAs, which are
presumed to function together. Thus, this model can
potentially detect co-functional miRNAs and mRNAs in
addition to refining miRNA-target predictions at the
expression level.
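
The Gaussian expression model described above can be sketched as follows. This is only an illustration of the model form, not the parameter learning procedure of Huang et al.; the function name, the fixed standard deviation `sigma`, the absence of a baseline term and the toy values are assumptions.

```python
import math

def mrna_log_likelihood(mrna_expr, mirna_expr, weights, sigma=1.0):
    """Gaussian log-likelihood of one mRNA across samples.

    mrna_expr: one expression value per sample.
    mirna_expr: one list per sample, holding a value for each regulator miRNA.
    weights: one non-negative weight per regulator miRNA (assumed).
    """
    total = 0.0
    for x, z in zip(mrna_expr, mirna_expr):
        # mean is the negative of the weighted sum of miRNA expression
        mean = -sum(w * zi for w, zi in zip(weights, z))
        total += (-0.5 * math.log(2 * math.pi * sigma ** 2)
                  - (x - mean) ** 2 / (2 * sigma ** 2))
    return total

# Toy centred expression values: 3 samples, 2 regulator miRNAs (hypothetical)
mirna = [[1.0, 0.5], [-1.0, 0.0], [0.5, -0.5]]
inverse = [-1.0, 0.8, -0.2]        # falls when the miRNAs rise
same_direction = [1.0, -0.8, 0.2]  # rises when the miRNAs rise
w = [0.8, 0.4]
ll_inverse = mrna_log_likelihood(inverse, mirna, w)
ll_same = mrna_log_likelihood(same_direction, mirna, w)
```

Under these assumptions the inversely expressed mRNA receives the higher log-likelihood, which is the behaviour the model is meant to reward.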

Joung et al. [82] proposed a probabilistic method to
integrate the miRNA-target binding information
and expression profiles of miRNAs and mRNAs
for MRMs. In this method, an MRM is defined as a
group of miRNAs and mRNAs with coherent expression
patterns in three dimensions (miRNA-mRNA,
miRNA-miRNA and mRNA-mRNA) across all
biological conditions. Two heterogeneous data
sources, miRNA-target prediction scores based on
target binding and the sample-matched expression
profiles of miRNA and mRNA, are integrated to
extract the coherent miRNA-mRNA modules. To
describe the coherence of miRNA-mRNA modules
in these three dimensions, the means of Pearson's
correlation coefficients between all miRNA or mRNA
pairs are aggregated with the mean binding scores of
all miRNA-mRNA pairs. This coherence is expressed as
a fitness function, and co-evolutionary learning with
estimation-of-distribution algorithms is used to search
iteratively for the groups of miRNAs and mRNAs that
give the best fitness scores. Thus, this method allows
detection of correlated miRNA-mRNA modules from
multiple data sources by using a balanced fitness
function. It was demonstrated on a human cancer data
set, in which two miRNA-mRNA modules were found
that are highly correlated with respect to their
expression and biological functions.
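
A fitness function of this kind might look like the sketch below. The equal-weight sum of the three terms, the function names and the toy data are assumptions made for illustration; the search procedure (co-evolutionary learning with estimation-of-distribution algorithms) is not shown. The candidate module needs at least two miRNAs and two mRNAs.

```python
from itertools import combinations
from statistics import mean

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def module_fitness(mirnas, mrnas, mirna_expr, mrna_expr, binding):
    """Sum of mean miRNA-miRNA correlation, mean mRNA-mRNA correlation
    and mean miRNA-mRNA binding score (equal weighting is an assumption)."""
    mir_corr = mean(pearson(mirna_expr[a], mirna_expr[b])
                    for a, b in combinations(mirnas, 2))
    mrna_corr = mean(pearson(mrna_expr[a], mrna_expr[b])
                     for a, b in combinations(mrnas, 2))
    bind = mean(binding.get((m, g), 0.0) for m in mirnas for g in mrnas)
    return mir_corr + mrna_corr + bind

# Toy profiles and binding scores (hypothetical values)
mirna_expr = {"miR-a": [1, 2, 3, 4], "miR-b": [2, 4, 6, 8], "miR-c": [4, 1, 3, 2]}
mrna_expr = {"g1": [4, 3, 2, 1], "g2": [8, 6, 4, 2], "g3": [1, 3, 2, 4]}
binding = {("miR-a", "g1"): 0.9, ("miR-a", "g2"): 0.8,
           ("miR-b", "g1"): 0.7, ("miR-b", "g2"): 0.6}

good = module_fitness(["miR-a", "miR-b"], ["g1", "g2"], mirna_expr, mrna_expr, binding)
poor = module_fitness(["miR-a", "miR-c"], ["g1", "g3"], mirna_expr, mrna_expr, binding)
```

The coherent candidate scores higher than the incoherent one, so a search guided by such a function would move towards the former.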

Tran et al. [83] proposed a rule-based learning
method to predict MRMs. It is based on the assumption
that genes regulated by the same miRNAs show
similar expression profiles. This method first uses
the miRNAs and their targets, as predicted by PicTar,
to construct miRNA-mRNA relationships at the
sequence level. This target relationship is represented
as a binary target matrix in which rows are mRNAs and
columns are miRNAs, and an element of the matrix is 1
if the miRNA in the column targets the mRNA in the
row, and 0 otherwise. For each mRNA in the given data
set, this method calculates Pearson's correlation
coefficients between its expression profile and that of
every other mRNA. The level of correlation is
then labelled as 'Similarity' or 'Dissimilarity' using a
pre-set arbitrary threshold. This similarity information
is added to the binary target matrix as an extra
column to construct a regulatory decision table,
which is then fed into CN2-SD [91], a rule induction
method. CN2-SD searches the regulatory decision table
for sets of 1s in the columns of rows labelled
'Similarity', that is, for a group of correlated mRNAs
co-targeted by a group of miRNAs. Pearson's
correlation coefficients are further calculated for the
miRNAs output by CN2-SD, and only highly correlated
miRNAs are retained for the final MRMs. This procedure
is repeated for every mRNA to find all MRMs in the
given data sets. This method was demonstrated with
a public data set. Several MRMs with high correlation
in the expression patterns of miRNAs and mRNAs
were found. The authors also showed that the mRNAs
included in the same modules share similar biological
functions. However, the basic assumption of this
method, that genes regulated by the same miRNAs show
similar expression profiles, does not always hold.
Thus, the results can be misleading and many potential
MRMs may be missed.
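
The construction of the regulatory decision table can be sketched as below. The sketch only builds the table for one reference mRNA and does not implement CN2-SD; the function names, the threshold of 0.8, the exclusion of the reference mRNA from its own table and the toy data are assumptions.

```python
from statistics import mean

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def decision_table(target, expr, anchor, threshold=0.8):
    """Binary target rows of all other mRNAs, each extended with a label
    saying whether the mRNA is correlated with the anchor mRNA."""
    table = {}
    for gene, row in target.items():
        if gene == anchor:
            continue  # assumption: the anchor is not compared with itself
        r = pearson(expr[anchor], expr[gene])
        label = "Similarity" if r >= threshold else "Dissimilarity"
        table[gene] = list(row) + [label]
    return table

# Rows are mRNAs, columns are three hypothetical miRNAs
target = {"g1": [1, 1, 0], "g2": [1, 1, 0], "g3": [0, 0, 1]}
expr = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 9], "g3": [4, 3, 2, 1]}
table = decision_table(target, expr, anchor="g1")
```

Here g2 is labelled 'Similarity' and shares 1s with g1 in the first two columns, which is the kind of pattern a rule induction method would then search for.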

Peng et al. [84] developed an approach that builds on
both Yoon's [80] and Tran's [83] methods. In this work,
MRMs are bicliques in which miRNAs co-target
mRNAs, as predicted at the sequence level, while the
miRNAs and mRNAs are negatively correlated at
the expression level. At the expression level,
pair-wise correlations between the differentially
expressed miRNAs and mRNAs are calculated with
Pearson's correlation across all matched samples. A
correlation threshold is determined by a desired false
detection rate, which is the percentage of miRNA-mRNA
pairs, out of the total number of selected pairs, that
would have the same or better correlation just by
chance. By applying this threshold, a 2D correlation
matrix is constructed in which each element indicates
whether the miRNA in the column is negatively
correlated with the mRNA in the row. In parallel, for
the same set of miRNAs and mRNAs, a 2D miRNA-target
matrix is built by examining whether each miRNA-mRNA
pair matches in the seed region. Then, a miRNA-mRNA
regulatory matrix is constructed by multiplying the
binary correlation matrix with the miRNA-target matrix
in the dot product fashion. The miRNA-mRNA regulatory
matrix is further represented as bipartite graphs, in
which a fast searching algorithm is used to enumerate
all the bicliques. The bicliques found to be
statistically significant by a permutation test are the
final MRMs. This method was applied to a data set of
human liver biopsy samples from a hepatitis C virus
study and identified 38 MRMs that were associated with
hepatitis C virus infection.
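
The construction of the regulatory matrix can be illustrated with a small sketch. It assumes that the multiplication is carried out element by element on two matrices of the same shape, and it takes the correlation threshold as given rather than deriving it from a false detection rate. The function name and the toy values are hypothetical.

```python
def regulatory_matrix(corr, target, threshold):
    """Combine expression and sequence evidence for each miRNA-mRNA pair.

    corr: rows are mRNAs, columns are miRNAs, entries are Pearson correlations.
    target: same shape, 1 if the pair has a seed match, else 0.
    threshold: a negative cut-off; correlations at or below it count as inverse.
    """
    # binary correlation matrix: 1 where the pair is negatively correlated
    binary_corr = [[1 if r <= threshold else 0 for r in row] for row in corr]
    # element-wise product keeps only pairs supported by both sources
    return [[b * t for b, t in zip(brow, trow)]
            for brow, trow in zip(binary_corr, target)]

corr = [[-0.9, 0.1],
        [-0.7, -0.8],
        [0.3, -0.65]]
target = [[1, 0],
          [1, 1],
          [1, 0]]
reg = regulatory_matrix(corr, target, threshold=-0.6)
```

The non-zero entries of `reg` are the edges of the bipartite graph in which bicliques would subsequently be enumerated.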

Recently, Zhang et al. [92] proposed a method to
infer MRMs by integrating miRNA-target predictions,
expression profiles of miRNA and mRNA
and the topological structures of protein-protein
interactions (PPIs). This method uses the miRNA-target
predictions as the basic structures; the expression
profiles of miRNA and mRNA are applied to these
structures to find the co-expressed miRNAs and mRNAs,
and PPIs are then used to refine the structures. A
novel machine learning method, sparse network
regularized multiple non-negative matrix factorization
(SNMNMF), was developed in this work to integrate the
three heterogeneous data sources. The authors tested
this method on a data set of ovarian cancer samples and
identified 49 significant MRMs, in which the miRNA
modules are enriched with miRNA clusters in their
chromosomal locations and the gene modules are
enriched with known functional gene sets.

The above methods (Table 3) aim at exploring
general miRNA-mRNA regulatory modules by
integrating miRNA-target prediction on sequence
with expression profiles of miRNA and mRNA or
other data sources. They have achieved varying degrees
of success on different trial data sets. However, they
identify groups of co-expressed miRNAs and
mRNAs without considering the biological conditions
of the samples. Therefore, no implications regarding
the functions of MRMs in specific biological
conditions can be identified. The functions of MRMs
in terms of biological processes usually are unclear
until a functional enrichment analysis is conducted
by querying the identified target genes against the
GO or other similar annotation databases [80, 83].
Biological conditions are very important in
biological experimental design, and hence some
conditionally related MRMs may be missed if the
conditions are not taken into account. This question
is of great interest for understanding the
biological pathways of MRMs in more detail.

Inferring FMRMs 

In order to resolve this limitation of MRMs and gain
understanding of miRNA functions in specific
biological processes, the concept of FMRMs [85] was
proposed. FMRMs explicitly characterize how
groups of miRNAs regulate their target mRNAs
and how they act together to form pathways in
complex regulatory networks for specific conditions.
Many methods have since been proposed to infer FMRMs
(Table 4).

Table 3: Summary of methods for inferring MRMs

Liu et al. [85] proposed an approach to infer
FMRMs by combining graph theory and association
rule mining. This method consists of two steps, one at
the sequence level and one at the expression level. At
the sequence level, putative miRNA regulatory networks
are constructed as bipartite graphs in which a
connection is made between a specific miRNA and its
predicted target mRNAs based on miRNA-target
prediction algorithms or target databases. A fast
biclique searching algorithm, the modular input
consensus algorithm (MICA) [93], is then applied to
enumerate all bicliques in the bipartite graphs. At the
expression level, association rule mining is used to
discover the significant associations between specific
biological conditions and the inverse expression
patterns of miRNAs and mRNAs on all enumerated
bicliques. Finally, the association relationships among
miRNAs, mRNAs and conditions are merged to form
the final FMRMs. This method was demonstrated
on a publicly available prostate cancer data set, in
which two modules were identified. These modules are
associated with cancer and normal conditions,
respectively. This study was the first published work
to explicitly discover FMRMs.
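
The expression-level step can be illustrated with the two standard measures of association rule mining, support and confidence. The sketch below is not the authors' implementation; the item encoding (for example "miR-a=up"), the direction of the rule and the toy samples are assumptions.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent.

    transactions: one set of items per sample.
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n_ante = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / len(transactions)
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence

# Each sample lists its condition and discretized expression states (toy data)
samples = [
    {"cancer", "miR-a=up", "g1=down"},
    {"cancer", "miR-a=up", "g1=down"},
    {"cancer", "miR-a=up", "g1=up"},
    {"normal", "miR-a=down", "g1=up"},
    {"normal", "miR-a=up", "g1=down"},
    {"normal", "miR-a=down", "g1=up"},
]
# Does the inverse pattern (miRNA up, target down) associate with cancer?
support, confidence = rule_metrics(samples, {"miR-a=up", "g1=down"}, {"cancer"})
```

In this toy data the inverse pattern occurs in three samples, two of which are cancer samples, giving a support of 1/3 and a confidence of 2/3.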

Joung et al. [86] used a generative model, adapted
from the Author-Topic model [94] originally proposed
in information retrieval, to predict FMRMs. This
method models the miRNA-mRNA regulatory mechanism
as hierarchical steps in which the FMRMs are defined
as functional clusters of miRNAs and target mRNAs
involved in the same biological processes. In this
generative model, the expression value of an mRNA is
regarded as the number of times an expression event of
the mRNA occurs in a sample. Each mRNA has expression
events in a specific condition, and these are likely to
be associated with its regulator miRNAs as given by
miRNA-target predictions. A hierarchical generative
process hypothesizes that a miRNA is sampled from a
multinomial distribution over FMRMs, and then the
sampled miRNA is used to sample mRNAs, which
have a multinomial distribution over conditions. An
approximate method, Gibbs sampling, is used to
infer the parameters of the generative model,
which characterize the FMRMs. This method integrates
data sets including miRNA-target information
and expression profiles of mRNAs. It
predicted several biological processes related to
miRNA-mRNA modules on an Arabidopsis data
set. The drawback of this method is that it does
not use the expression profiles of miRNAs. Thus,
the regulatory relationships of miRNAs and
mRNAs largely rely on the miRNA-target information
predicted at the sequence level.

Table 4: Summary of methods for inferring FMRMs

Method | Data sources | miRNA-target database use | Differential gene analysis | Key features | Availability of software | Reference
Liu et al. | miRNA-target predictions, expression profiles of miRNA and mRNA, and sample information | Any | Yes | A rule-based method; searching for bicliques with inversed miRNA-mRNA pairs associating with biological conditions. | Upon request | [85]
Joung et al. | miRNA-target prediction and expression profiles of mRNA | Any | No | ... to identify the co-expressed ... | ... | [86]
Liu et al. | ... | ... | ... | A BN structure learning-based ... network. | ... | [87]
Bonnet et al. | ... | ... | No | ... regulation programs. | LeMoNe. http://bioinfor- | [97]
Liu et al. | Expression profiles of miRNA and mRNA with or without miRNA-target predictions | Any | No | A probabilistic graphical model based on Corr-LDA. | Corr-LDA, upon request | [88]

N/A, not applicable.

Liu et al. [87] proposed a BN-based method, named
Bayesian network with splitting-averaging (BNSA), to
identify complex miRNA-mRNA interactions for
FMRMs. They demonstrated that conventional BNs are
not able to identify all the interactions potentially
existing in the data. Thus, BNSA was proposed to
discover all possible miRNA-mRNA interactions,
including the subtle ones undetectable by conventional
BNs. This method integrates miRNA-target information,
sample-matched expression profiles of miRNA and
mRNA, and sample categories. In order to capture
all possible interactions, this method groups the
expression profiles of miRNAs and mRNAs
according to their sample category and then learns
BN structures on the expression profiles of miRNA
and mRNA in each category separately. The
miRNA-target information acts as a constraint to
guide the structure learning, whereby the miRNAs
represent the parent nodes and the mRNAs are the
descendant nodes. The edges linking the parent
nodes to descendant nodes can only be those defined
in the miRNA-target predictions. The interaction
networks learned on each category are then integrated
by a BN averaging procedure. To avoid statistically
insignificant results due to the small size of the data
sets, the method uses bootstrapping to achieve reliable
inference and integration. This method was demonstrated
on the NCI-60 data sets [95] and used to characterize
the FMRMs involved in epithelial to mesenchymal
transition (EMT). The results show that this method
captured all possible types of miRNA-mRNA
interactions, including both negatively correlated and
positively correlated miRNA-mRNA pairs, from
the data in terms of EMT. For the first time, it
demonstrated that positively correlated expression
patterns of miRNA-mRNA pairs also exist widely
in the data alongside negatively correlated ones
(Figure 3). Many interactions are of tremendous
biological significance according to pathway analysis.
Some discoveries have been validated by previous
research, such as the miR-200 family negatively
regulating ZEB1 and ZEB2 in EMT. Some are
consistent with the literature, and many novel
interactions are statistically significant and worthy of
further investigation and validation.

Figure 3: An FMRM identified by BNSA from analysis of schizophrenia subjects. It shows that miRNAs may up- or
down-regulate their target mRNAs, either directly or indirectly. Up-regulated miRNAs are coloured in red and
down-regulated miRNAs are coloured in green. Up-regulated mRNAs are coloured in yellow, while down-regulated
mRNAs are coloured in blue.

Nunez-Iglesias et al. [96] proposed a method to
infer FMRMs by correlation tests with permutation.
This method calculates the expression correlations
between miRNAs and predicted target mRNAs
with permutation tests, across all given samples
(globally) and within case and control samples
separately (locally). The correlation coefficients are
then standardized to give scores of how concordant the
miRNAs are with the mRNAs globally and locally. Then
the miRNA-mRNA pairs are identified by searching
for the highest difference in the scores between
case and control samples. The identified miRNA-mRNA
pairs are merged to form the final FMRMs, which
have differential power in different biological
conditions. This method again showed that both
negative and positive correlations exist widely in the
expression of miRNAs and their targets. Functional
analysis further suggests that positively correlated
miRNA-mRNA pairs have functions as important as
those of the negatively correlated ones.
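
The local comparison between case and control samples can be sketched as follows. The sketch uses raw Pearson correlations and a simple two-sided permutation p-value; the standardization of the coefficients described above is omitted, and the function names, the number of permutations, the seed and the toy profiles are assumptions.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def permutation_pvalue(x, y, n_perm=1000, seed=0):
    """Fraction of shuffles whose |correlation| reaches the observed one."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson(x, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def local_difference(mir_case, mrna_case, mir_ctrl, mrna_ctrl):
    """Correlation in case samples minus correlation in control samples."""
    return pearson(mir_case, mrna_case) - pearson(mir_ctrl, mrna_ctrl)

# One miRNA-mRNA pair measured in five case and five control samples (toy data)
mir_case, mrna_case = [1, 2, 3, 4, 5], [5, 4, 3, 2, 1]
mir_ctrl, mrna_ctrl = [1, 2, 3, 4, 5], [2, 1, 3, 2, 3]
diff = local_difference(mir_case, mrna_case, mir_ctrl, mrna_ctrl)
p_case = permutation_pvalue(mir_case, mrna_case)
```

A pair such as this one, inversely correlated in cases but not in controls, yields a large negative difference and would rank highly in the search for condition-specific pairs.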

In contrast to the above methods, Bonnet et al. [97]
proposed to infer FMRMs using expression profiles of
miRNA and mRNA only. Their method involves
two steps. In the first step, it searches the mRNA
expression profiles for sets of tight clusters, which are
groups of genes consistently clustered together under
the different biological conditions. A Gibbs sampling
approach is used to cluster the expression profiles
along both genes and conditions simultaneously.
Multiple clustering solutions are generated
by Gibbs samplers with a range of configurations.
Then, a set of tight clusters is produced by averaging
the multiple clustering solutions. In the second step,
this method assigns a set of regulators, including
miRNAs, transcription factors or signal transducers, to
each tight cluster. This assignment is learned using a
fuzzy decision tree model. In this model, the clustered
conditions of each module output from the first step
are first linked together in a hierarchical decision
tree in which each node is a split into two sets of
conditions corresponding to the under- or
over-expressed levels of mRNAs. Then, regulators
are assigned to each node of the tree using a
probabilistic score reflecting how well the expression
levels of the regulator match the mRNA expression
levels defined by the split value. In order to avoid
over-fitting, multiple condition clusters are generated.
Thus, there are multiple decision trees and multiple
regulators assigned to each node of each
hierarchical tree. Finally, FMRMs are extracted by
an ensemble approach that captures the
regulators most frequently assigned to each condition,
given a set of tight clusters of mRNAs. The algorithm
was initially designed to infer gene regulatory
modules, and the open-source software package is
named LeMoNe. In contrast to the inference of gene
regulatory modules, here miRNAs are assigned as
candidate regulators in LeMoNe. This method
demonstrated that FMRMs can be inferred from
expression profiles of miRNA and mRNA only.
Some researchers have suggested that algorithms
that do not consider known targets may avoid bias
[36, 37, 98]. Thus, this method potentially avoids the
bias introduced by miRNA-target predictions.
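
The final ensemble step can be illustrated by counting how often each regulator is assigned across runs. This is a simplified sketch and does not use the LeMoNe software or its scoring; the function name, the `min_fraction` cut-off and the regulator names are hypothetical.

```python
from collections import Counter

def consensus_regulators(assignments, min_fraction=0.5):
    """Regulators assigned to a module in at least min_fraction of the runs.

    assignments: one list of regulator names per decision tree (run).
    """
    # count each regulator at most once per run
    counts = Counter(reg for run in assignments for reg in set(run))
    n_runs = len(assignments)
    return sorted(reg for reg, c in counts.items() if c / n_runs >= min_fraction)

# Regulators assigned to the same module by four hypothetical trees
runs = [["miR-x", "TF-1"], ["miR-x"], ["miR-x", "miR-y"], ["TF-1", "miR-x"]]
top = consensus_regulators(runs)
```

Regulators that appear in only a minority of the trees, such as miR-y here, are dropped, which is how an ensemble reduces the effect of over-fitting in any single tree.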

A method proposed by Liu et al. [88] allows more
flexible choices, in that expression profiles of miRNA
and mRNA are used to infer FMRMs with or without
integrating the miRNA-target predictions. This
approach is a probabilistic graphical model based on
correspondence latent Dirichlet allocation
(Corr-LDA) [99], in which FMRMs are defined as latent
variables governing the expression values of miRNA
and mRNA, which in turn are associated with a variety
of biological functions. Given k latent FMRMs present
in the samples, this method models miRNAs
and mRNAs as observations generated from a
probabilistic process over the k FMRMs. Therefore, each
sample is a random mixture of miRNAs and mRNAs
associated with the k modules. By inferring the
probability distributions of the latent variables, this
method captures the likelihood that samples, miRNAs and
mRNAs are associated with functional modules. A
Gibbs sampling method was developed to infer the
parameters of this model. Under this model,
miRNAs can be associated with any functional module,
while mRNAs may only be associated with the
modules that produce the miRNAs. In effect, it
captures the hierarchical notion that miRNAs are
generated under specific FMRMs, and mRNAs are
regulated by the miRNAs in the given FMRMs.
This model was applied to a mouse mammary data
set. It effectively captured several biological
process-specific modules involving miRNAs and their
target mRNAs. Furthermore, without using prior target
binding information, the identified miRNAs and
mRNAs in each module show a large proportion of
overlap with predicted miRNA-target relationships,
suggesting that expression profiles of miRNA and
mRNA are crucial for both target identification and
discovery of FMRMs.
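
The hierarchical constraint described above, in which mRNAs may only come from modules that have produced a miRNA, can be sketched as a generative process in the style of Corr-LDA. The sketch fixes the module mixture `theta` instead of drawing it from a Dirichlet prior and does not perform inference; all names, distributions and values are illustrative assumptions.

```python
import random

def generate_sample(theta, mirna_dist, mrna_dist, n_mirna, n_mrna, rng):
    """Draw the miRNA and mRNA observations of one sample.

    theta: mixture weights over the k modules for this sample.
    mirna_dist, mrna_dist: one {name: probability} dict per module.
    """
    modules = list(range(len(theta)))
    mirna_draws = []
    for _ in range(n_mirna):
        z = rng.choices(modules, weights=theta)[0]  # module for this miRNA
        names = list(mirna_dist[z])
        mir = rng.choices(names, weights=[mirna_dist[z][n] for n in names])[0]
        mirna_draws.append((z, mir))
    mrna_draws = []
    for _ in range(n_mrna):
        # an mRNA reuses the module of a uniformly chosen miRNA draw
        y = rng.choice(mirna_draws)[0]
        names = list(mrna_dist[y])
        gene = rng.choices(names, weights=[mrna_dist[y][n] for n in names])[0]
        mrna_draws.append((y, gene))
    return mirna_draws, mrna_draws

rng = random.Random(1)
theta = [0.7, 0.3]
mirna_dist = [{"miR-a": 0.9, "miR-b": 0.1}, {"miR-b": 0.2, "miR-c": 0.8}]
mrna_dist = [{"g1": 0.5, "g2": 0.5}, {"g3": 1.0}]
mirna_draws, mrna_draws = generate_sample(theta, mirna_dist, mrna_dist, 5, 8, rng)
```

Whatever the random draws, every module that generates an mRNA has also generated at least one miRNA in the same sample, which mirrors the notion that mRNAs are regulated by the miRNAs of their module.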



CONCLUSIONS AND OUTLOOKS 

miRNAs have been recognized as pivotal factors in
defining the specificity and sensitivity of
post-transcriptional gene silencing. Identifying miRNAs
and their target genes from the genome, and further
inferring their functions and regulatory mechanisms,
are critical in understanding the biological processes
of organisms and may shed light on their roles in
the pathophysiology of disease.

While some validated miRNAs and their target
genes have been collated in databases, such as
TarBase [100] and miRecords [75], these in no
way reflect the diversity and abundance of potential
miRNA regulatory influences. It is infeasible to
explore empirically all the possibilities in this
combinatorial matrix because of the laborious tasks
involved. As such, a complete understanding of miRNA
functions and their precise regulatory mechanisms
remains elusive.

Now that many miRNAs have been identified and
their targets have been predicted, research interest
is moving towards identifying the functions of miRNAs
and their regulatory mechanisms. However,
characterizing these aspects represents a significant
challenge because of the complex and subtle features of
miRNAs and of the RNA-induced silencing complex with
which miRNAs may associate. To gain global and
yet specific insights into the functions and evolution
of a broad layer of post-transcriptional control, it is
particularly useful to integrate miRNA and mRNA
sequence and expression profiles and compare these
data with other comparative genomic information
[17]. High-throughput technologies, such as
microarray, mass spectrometry and especially the newly
developed NGS, have provided tremendous potential
for profiling various molecules at several levels
with unprecedented resolution, depth and speed.
These features of the technologies bring new
bioinformatics opportunities as well as challenges.

In this review, we focused on the computational
methods of inferring miRNA functions at the
miRNA-mRNA level and provided an introduction to miRNA
discovery and miRNA-target prediction. The concepts
applied by these methods can largely be regarded
as integration of heterogeneous data sources
in functional analysis of miRNAs. Depending on the
data sources involved, we classify these methods into
three categories: miRNA functional annotation,
inferring MRMs and inferring FMRMs. Several
methods have been proposed in the last few years.
Some methods have been released and are free to use,
while others are available on request.

How effective these algorithms are at present is still
difficult to determine with such a limited selection of
data sets and without extensive biological validation.
Complex features of miRNAs make functional
annotation and regulatory mechanisms even harder to
evaluate, particularly as different algorithms focus on
slightly different aspects of miRNA-mRNA
interactions. The selection of methodology will be
dependent on the available information. If a list of
miRNAs is the only available information, we are
limited to miRNA functional annotation. If the
miRNA-target prediction and the expression of
mRNAs are available, then the method of Joung et al.
[86] based on the Author-Topic model is probably the
most appropriate. If expression profiles of both miRNA
and mRNA are available, it is possible to go further
and infer the biologically specific FMRMs through
BNSA [87] or Corr-LDA [88]. While these methods
are not directly comparable, they can complement
each other.
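
The guidance in the preceding paragraph can be written down as a small decision function. This merely restates the suggestions above in code form; the function name and the returned strings are illustrative and not part of any published tool.

```python
def suggest_approach(has_target_predictions, has_mrna_expression, has_mirna_expression):
    """Map the available data sources to the category of method suggested above."""
    if has_mrna_expression and has_mirna_expression:
        # expression profiles of both miRNA and mRNA are available
        return "FMRM inference, e.g. BNSA [87] or Corr-LDA [88]"
    if has_target_predictions and has_mrna_expression:
        # target predictions plus mRNA expression only
        return "Author-Topic model of Joung et al. [86]"
    # only a list of miRNAs (or too little else) is available
    return "miRNA functional annotation"

choice = suggest_approach(has_target_predictions=True,
                          has_mrna_expression=True,
                          has_mirna_expression=False)
```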

We are a long way from understanding miRNA
regulatory mechanisms on a large scale. The
methodologies discussed in this review have the capacity
to infer miRNA regulatory mechanisms with
miRNA-target predictions and expression profiles
of miRNA and mRNA. Biological discovery has
suggested that miRNA regulation can degrade
mRNAs as well as inhibit protein translation.
Although one-third of mRNAs repressed in the
translation process display detectable destabilization,
more are repressed without detectable changes in
mRNA levels [50]. Thus, the global impact on
protein outputs has not yet been determined in great
detail. Furthermore, transcription factors also play
important roles in regulating gene expression. They may
co-regulate genes with miRNAs. Exploration of the
wiring of miRNA regulatory relationships together with
known protein-protein interaction data, phenotypic
data, transcriptional regulatory interactions and other
functional genomics data may help to further elucidate
the function of miRNAs at a system-wide level.
Fortunately, some work falling in these categories is
emerging [101-103]. In summary, by integrating
genome-wide computational and experimental data, we
may have an unprecedented opportunity
to study the functions and evolution of a broad layer of
gene regulatory control mediated by miRNAs.
Key Points

• Discovering miRNAs, identifying their targets and further
inferring miRNA functions have been a critical strategy for
understanding normal biological processes of miRNAs and their
roles in the development of disease.
• By integrating genome-wide computational and experimental
data, we have the unprecedented opportunity to study functions
and evolution of a broad layer of gene regulatory control
mediated by miRNAs.